Abstract. When searching for a new resonance somewhere in a possible mass range, the significance of observing a local excess of events must take into account the probability of observing such an excess anywhere in the range. This is the so-called "look-elsewhere effect". The effect can be quantified in terms of a trial factor, which is the ratio between the probability of observing the excess anywhere in the range and the probability of observing it at some fixed mass point. We propose a simple and fast procedure for estimating the trial factor, based on earlier results by Davies. We show that asymptotically, the trial factor grows linearly with the (fixed mass) significance.



1 Introduction 

The statistical significance associated with the observation of new phenomena is usually expressed using a p-value, that is, the probability that a similar or more extreme effect would be seen when the signal does not exist (a situation usually referred to as the null or background-only hypothesis). It is often the case that one does not know a priori where the signal will appear within some possible range. In that case, the significance calculation must take into account the fact that an excess of events anywhere in the range could equally be considered as a signal. This is known as the "look-elsewhere effect" [1][2]. In the statistical literature this situation is usually referred to as a hypothesis test when a nuisance parameter is present only under the alternative, for which the standard regularity conditions do not apply. The problem is, however, closely related to that of level crossings of a stochastic process, which has been studied extensively and for which many relevant results exist [3][4]. In particular, [4] provides an upper bound on the tail probability of the maximum of a chi-squared process, which can therefore be applied to cases when the test statistic is the profile likelihood ratio and the large-sample distribution at a fixed point given by Wilks' theorem [5] holds.

Of course, a straightforward way of quantifying the look-elsewhere effect is simply to run many Monte Carlo simulations of background-only experiments and to find, for each one, the largest fluctuation that resembles a signal.¹ Examples of such an approach can be found e.g. in [6]. While this procedure is simple and gives the correct answer, it is also time and CPU consuming, as one would have to repeat it $O(10^7)$ times to get the p-value corresponding to a $5\sigma$ significance. Approximations based on large sample limits such as [4] can therefore be valuable.

¹ It is assumed that the signal can appear only in one location; therefore, when a search is conducted, one will look for the largest excess of events and regard all others as background fluctuations.

To make the discussion concrete, let us first describe a typical example of a 'mass bump' search. The statistical model in this case consists of a background distribution $b$, and a signal distribution $s(m)$ with an unknown location parameter $m$ (corresponding to the mass of a resonance). We introduce a signal strength parameter $\mu$, such that the model is given by $\mu s(m) + b$, and we test $H_0: \mu = 0$ against the alternative, $\mu > 0$.² Both $s$ and $b$ may depend on additional nuisance parameters which are omitted from the notation. The mass $m$ is another nuisance parameter, but it does not exist under $H_0$ since $b$ does not depend on $m$. The testing problem therefore has a nuisance parameter that is present only under the alternative. When searching for a resonance that can appear anywhere in the range, one will look for the largest excess of events above the background. More precisely, if $q(m)$ is a test statistic for a fixed mass, and large values of $q(m)$ indicate decreasing compatibility with the null hypothesis, then the test statistic of the entire range would be $q(\hat{m}) = \max_m [q(m)]$. The problem is therefore in assessing the tail probability (p-value) of the maximum of $q(m)$ over the search range.

In section 2 we review the main result of [4] and its application to the present case. We propose a practical procedure for estimating the p-value, which we then demonstrate with a toy model simulation in section 3.

² Strictly speaking, Wilks' theorem does not apply here since the tested value of $\mu$ is on the boundary of the allowed region. This difficulty, however, can be easily alleviated by extending the allowed region of $\mu$ to negative values and then reducing the p-value by half. A formal generalization of this nature is given in [7].



2 Tail probabilities of the likelihood ratio test statistic

Suppose that a signal hypothesis depends on a nuisance parameter $\theta$ that does not exist under the null. We denote by $q(\theta)$ the profile likelihood test statistic for some fixed $\theta$ and we assume that it follows a $\chi^2$ distribution with $s$ degrees of freedom, as would be the case in the large sample limit when Wilks' theorem holds. We are interested in the tail probability of the maximum of $q(\theta)$ over $\theta$, which we denote by $q(\hat{\theta})$. As shown in [4], this is bounded by:

$$P(q(\hat{\theta}) > c) \le P(\chi^2_s > c) + \langle N(c) \rangle \tag{1}$$

where $N(c)$ is the number of 'upcrossings' of the level $c$ by the process $q(\theta)$, with an expectation that is given by [4]:

$$\langle N(c) \rangle = \frac{c^{(s-1)/2}\, e^{-c/2}}{\sqrt{\pi}\, 2^{s/2}\, \Gamma(s/2 + 1/2)} \int_L^U C(\theta)\, d\theta \tag{2}$$

where $[L, U]$ is the range of possible values of $\theta$, and $C(\theta)$ is some function that depends on the details of the statistical model. To have the maximum of $q(\theta)$ above the level $c$ means that either the value at the lower threshold $q(L)$ is larger than $c$, or that there is at least one upcrossing, hence the two terms in (1). Note that the bound is expected to become an equality for large values of $c$, as the expected number of upcrossings will be dominated by the probability of one upcrossing, that is when $\langle N(c) \rangle \ll 1$.
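To illustrate what counting upcrossings means in practice, the following is a minimal Python sketch. It assumes that $q(\theta)$ has been evaluated on an ordered grid of $\theta$ values; the function name `count_upcrossings` and the sample scan are hypothetical and are not taken from the analysis described here.

```python
from typing import Sequence


def count_upcrossings(q: Sequence[float], level: float) -> int:
    """Count upward crossings of `level` in a scan of q on an ordered grid.

    An upcrossing is a pair of neighbouring grid points where q goes from
    at-or-below the level to strictly above it. A scan that already starts
    above the level does not count as an upcrossing; that case corresponds
    to the chi-squared term in Eq. (1).
    """
    n = 0
    for prev, curr in zip(q, q[1:]):
        if prev <= level < curr:
            n += 1
    return n


if __name__ == "__main__":
    # Hypothetical scan values, for illustration only.
    scan = [0.1, 0.7, 0.2, 0.9, 1.5, 0.3, 0.6, 0.4]
    print(count_upcrossings(scan, 0.5))
```

The grid must be fine enough that no fluctuation rises above and falls back below the level between two neighbouring points, otherwise upcrossings are missed.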

The function $C(\theta)$ can in general be difficult to calculate. Instead, we propose to estimate $\langle N(c_0) \rangle$ at some low reference level $c_0$ by simply counting the number of upcrossings in a small set of background-only Monte Carlo simulations. Eq. (1) then becomes

$$P(q(\hat{\theta}) > c) \le P(\chi^2_s > c) + \langle N(c_0) \rangle \left(\frac{c}{c_0}\right)^{(s-1)/2} e^{-(c - c_0)/2} \tag{3}$$

and so once $\langle N(c_0) \rangle$ is determined, the p-value of a given observation and the corresponding significance are readily obtained.

Naturally, one would like to choose the reference level $c_0$ in a way that will minimize the resulting uncertainty on the bound. From (3), it can be seen that the statistical uncertainty is proportional to $\sigma_N / \langle N \rangle$, where $\sigma_N$ is the standard deviation of $N$. As long as the distribution of $N$ is 'well behaved' in the sense that the relative uncertainty decreases with increasing $\langle N \rangle$, then the optimal choice of $c_0$ would be that which maximizes the expected number of upcrossings (2), that is $c_0 = s - 1$. Note that in the case $s = 1$, the maximal number of upcrossings occurs at $c_0 \to 0$; however, the ability to reliably estimate $\langle N(c_0) \rangle$ at very low levels depends on the numerical resolution at which $q(\theta)$ is calculated. In this case, one should therefore aim at having $c_0$ as low as possible but still significantly larger than the numerical resolution of $q(\theta)$, and the typical distance between upcrossings should be kept significantly larger than the $\theta$ resolution. In the example considered in the following section, $c_0 = 0.5$ satisfies those conditions and proves to be a good choice.
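The extrapolation in Eq. (3) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used for the results in this paper: $s$ is taken to be a positive integer, the $\chi^2$ tail is evaluated with a closed-form series so that only the standard library is needed, and the names `chi2_sf`, `mean_upcrossings` and `pvalue_bound`, as well as the list of counts, are hypothetical.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def chi2_sf(c: float, s: int) -> float:
    """Tail probability P(chi2_s > c) for integer s >= 1 (closed-form series)."""
    x = c / 2.0
    if s % 2 == 0:
        term, total = math.exp(-x), 0.0
        for k in range(s // 2):
            total += term          # adds e^{-x} x^k / k!
            term *= x / (k + 1)
        return total
    total = math.erfc(math.sqrt(x))
    term = math.exp(-x) * math.sqrt(x) / math.gamma(1.5)
    for k in range(1, (s - 1) // 2 + 1):
        total += term              # adds e^{-x} x^{k-1/2} / Gamma(k+1/2)
        term *= x / (k + 0.5)
    return total


def mean_upcrossings(counts: Sequence[int]) -> Tuple[float, float]:
    """Sample mean of N(c0) and its standard error over pseudo-experiments."""
    return mean(counts), stdev(counts) / math.sqrt(len(counts))


def pvalue_bound(c: float, s: int, n_c0: float, c0: float) -> float:
    """Right-hand side of Eq. (3), given <N(c0)> measured at level c0 > 0."""
    scale = (c / c0) ** ((s - 1) / 2.0) * math.exp(-(c - c0) / 2.0)
    return chi2_sf(c, s) + n_c0 * scale


if __name__ == "__main__":
    # Hypothetical upcrossing counts from a handful of pseudo-experiments.
    counts = [4, 5, 3, 4, 6, 4, 5, 3]
    n_c0, err = mean_upcrossings(counts)
    print(n_c0, err, pvalue_bound(16.0, 1, n_c0, 0.5))
```

Because the second term of Eq. (3) is linear in $\langle N(c_0) \rangle$, the relative uncertainty returned by `mean_upcrossings` carries over directly to that term of the bound.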

It is further interesting to note that the dependence of $\langle N(c) \rangle$ on $c$ in (2) is the same as the asymptotic form of the tail probability of a $\chi^2$ variable with $s + 1$ degrees of freedom,

$$P(\chi^2_{s+1} > c) \xrightarrow[c \to \infty]{} \frac{c^{\frac{1}{2}(s+1) - 1}\, e^{-c/2}}{2^{\frac{1}{2}(s+1) - 1}\, \Gamma((s+1)/2)} \tag{4}$$

allowing us to write, for large $c$ ($c \gg s$),

$$P(q(\hat{\theta}) > c) \approx P(\chi^2_s > c) + \mathcal{N}\, P(\chi^2_{s+1} > c) \tag{5}$$

where

$$\mathcal{N} = \frac{1}{\sqrt{2\pi}} \int_L^U C(\theta)\, d\theta \tag{6}$$

and the bound has been replaced by a '$\approx$' sign since we are dealing with the large $c$ limit. The probability described by eq. (5) has a natural interpretation. It is the same as one would have for a random variable that is the maximum of $n$ independent $\chi^2_{s+1}$ variates and one $\chi^2_s$ variate, with $E[n] = \mathcal{N}$. That is, if

$$y = \max[x_0, x_1, \ldots, x_n] \tag{7}$$

and

$$x_0 \sim \chi^2_s, \qquad x_i \sim \chi^2_{s+1}, \quad i = 1 \ldots n, \qquad E[n] = \mathcal{N},$$

then the tail probability $P(y > c)$ for large $c$ is given by the right hand side of (5).

Intuitively, this suggests that we can view the range of $\theta$ as being composed of several (on average $\mathcal{N}$) independent regions, where in each one the likelihood fit involves an extra degree of freedom due to the variable mass, leading to a $\chi^2_{s+1}$ distribution. The $\chi^2_s$ term accounts for the possibility of having a local maximum on the edge of the allowed region (assuming, from symmetry reasons, that each edge has a probability 1/2 of being a local maximum). The number $\mathcal{N}$ can therefore be interpreted as an 'effective number' of independent search regions.

We remark that exactly such an intuitive reasoning was employed by Quayle [8] as a conjecture for the distribution of $q(\hat{\theta})$. It was found that the distribution of a random variable defined according to (7) reproduces, to a good approximation, that of $q(\hat{\theta})$, and that the agreement is better at the tail. As shown above, this behavior is expected and the conjecture is in fact a limiting case of Davies' formula (1).



2.1 Trial factors

It is sometimes useful to describe the look-elsewhere effect in terms of a trial factor, which is the ratio between the probability of observing the excess anywhere in the range and the probability of observing it at some fixed mass point. From (5), we have

$$\mathrm{trial\#} = \frac{P(q(\hat{\theta}) > c)}{P(q(\theta) > c)} \tag{8}$$

$$\approx 1 + \mathcal{N}\, \frac{P(\chi^2_{s+1} > c)}{P(\chi^2_s > c)} \tag{9}$$

$$\approx 1 + \mathcal{N} \sqrt{\frac{c}{2}}\, \frac{\Gamma(s/2)}{\Gamma((s+1)/2)} \tag{10}$$

For the case $s = 1$, $\sqrt{c}$ is just the 'fixed' significance, that is the quantile of a standard gaussian corresponding to a p-value of chi-square with $s$ degrees of freedom. This also holds asymptotically for $s > 1$, as $Z_{fix} = \sqrt{c} + O\!\left(s \frac{\log c}{\sqrt{c}}\right)$. We therefore have, for $c \gg s$,

$$\mathrm{trial\#} \approx 1 + \frac{1}{\sqrt{2}}\, \mathcal{N} Z_{fix}\, \frac{\Gamma(s/2)}{\Gamma((s+1)/2)} \tag{11}$$

which, for the common case of $s = 1$, is

$$\mathrm{trial\#}_{s=1} \approx 1 + \sqrt{\frac{\pi}{2}}\, \mathcal{N} Z_{fix} \tag{12}$$

The trial factor is thus asymptotically linear with both the effective number of independent regions, and with the fixed-mass significance.
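The trial factor expressions of Eqs. (9) and (11) can be compared numerically with a short Python sketch. It is an illustration under assumptions rather than part of the original analysis: $s$ is a positive integer, the $\chi^2$ tail uses a closed-form series from the standard library only, and the function names are hypothetical. The value 5.58 is the effective number of regions quoted for the toy model in section 3.

```python
import math


def chi2_sf(c: float, s: int) -> float:
    """Tail probability P(chi2_s > c) for integer s >= 1 (closed-form series)."""
    x = c / 2.0
    if s % 2 == 0:
        term, total = math.exp(-x), 0.0
        for k in range(s // 2):
            total += term          # adds e^{-x} x^k / k!
            term *= x / (k + 1)
        return total
    total = math.erfc(math.sqrt(x))
    term = math.exp(-x) * math.sqrt(x) / math.gamma(1.5)
    for k in range(1, (s - 1) // 2 + 1):
        total += term              # adds e^{-x} x^{k-1/2} / Gamma(k+1/2)
        term *= x / (k + 0.5)
    return total


def trial_factor(c: float, s: int, n_eff: float) -> float:
    """Eq. (9): 1 + N * P(chi2_{s+1} > c) / P(chi2_s > c)."""
    return 1.0 + n_eff * chi2_sf(c, s + 1) / chi2_sf(c, s)


def trial_factor_asymptotic(z_fix: float, s: int, n_eff: float) -> float:
    """Eq. (11): large-c approximation, linear in Z_fix."""
    ratio = math.gamma(s / 2.0) / math.gamma((s + 1) / 2.0)
    return 1.0 + n_eff * z_fix * ratio / math.sqrt(2.0)


if __name__ == "__main__":
    n_eff = 5.58  # effective number of regions quoted for the toy model
    for z in (3.0, 4.0, 5.0):
        # For s = 1 the fixed-mass significance is Z_fix = sqrt(c).
        print(z, trial_factor(z * z, 1, n_eff), trial_factor_asymptotic(z, 1, n_eff))
```

For $s = 1$ the two expressions differ by a few percent at $Z_{fix} = 5$, the residual coming from the sub-leading terms dropped in going from (9) to (10).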



3 Toy model simulations 

We shall now illustrate the procedure described 
above with a simple example. Our toy model 
consists of a gaussian signal ('mass bump') on 
top of a continuous background that follows a 
Rayleigh distribution, in a mass range [0,120]. 
The width of the gaussian increases linearly with 
the mass, representing a case where the mass res- 
olution changes with the mass. 

We assume that the background shape is known 
but its normalization is not, so that it is a free 
parameter in the fit (i.e. a nuisance parameter), 
together with the signal location and normaliza- 
tion. We use a binned profile likelihood ratio as 
our test statistic, where the number of events in 
each bin is assumed to be Poisson distributed 
with an expected value

$$E(n_i) = \mu s_i(m) + \beta b_i \tag{13}$$

where $\mu$ is the signal strength parameter, $s_i(m)$ corresponds to a gaussian located at a mass $m$, $\beta$ is the background normalization and $b_i$ are fixed and given by the Rayleigh distribution. For simplicity of notation we will use in the following $s = \{s_i\}$ and $b = \{\beta b_i\}$. The hypothesis that no signal exists, or equivalently that $\mu = 0$, will be referred to as the null hypothesis, $H_0$. $\hat{\mu}$ and $\hat{b}$ will denote maximum likelihood estimators while $\hat{\hat{b}}$ will denote the conditional maximum likelihood estimator of the background normalization under the null hypothesis.

The test statistic $q(m)$ is defined as:

$$q(m) = -2 \ln \frac{\mathcal{L}(\hat{\hat{b}})}{\mathcal{L}(\hat{\mu} s(m) + \hat{b})} \tag{14}$$

where $\mathcal{L}$ is the likelihood function. An example background-only pseudo-experiment is shown in Fig. 1 together with $q(m)$.

We choose a reference level of $c_0 = 0.5$ and estimate the expected number of upcrossings, as demonstrated in Fig. 1, from a set of 100 Monte Carlo simulations of background experiments. This gives $\langle N(c_0) \rangle = 4.34 \pm 0.11$, which corresponds to $\mathcal{N} = 5.58 \pm 0.14$. The distribution of $q(\hat{m})$ is then estimated from a sample of ~1 million background simulations and we compare the tail probabilities to the prediction of (3). The results are shown in Fig. 2. The bound of (3) gives an excellent approximation to the observed p-values for large $c$.





Fig. 1. (top) An example pseudo-experiment with background only. The solid line shows the best signal fit, while the dotted line shows the background fit. (bottom) The likelihood ratio test statistic $q(m)$. The dotted line marks the reference level $c_0$ with the upcrossings marked by the dark dots. Note the broadening of the fluctuations as $m$ increases, reflecting the increase in the signal gaussian width.

Fig. 2. (top) Distribution of $q(\hat{m})$. (bottom) Tail probability of $q(\hat{m})$. The solid line shows the result of the Monte Carlo simulation, the dotted red line is the predicted bound (eq. 3) with the estimated $\langle N(c_0) \rangle$ (see text). The yellow band represents the statistical uncertainty due to the limited sample size.
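As a consistency check on the values quoted for the $s = 1$ toy model, the conversion from $\langle N(c_0) \rangle$ to $\mathcal{N}$ follows from dividing Eq. (2) by Eq. (6). The short worked calculation below is an added illustration that uses only the quoted numbers; it is not simulation output.

$$
\frac{\langle N(c_0) \rangle}{\mathcal{N}} = \frac{c_0^{(s-1)/2}\, e^{-c_0/2}}{2^{(s-1)/2}\, \Gamma((s+1)/2)} \;\xrightarrow{\; s = 1 \;}\; e^{-c_0/2},
\qquad
\mathcal{N} = \langle N(c_0) \rangle\, e^{c_0/2} = 4.34 \times e^{0.25} \approx 4.34 \times 1.284 \approx 5.57 .
$$

The uncertainty scales by the same factor, $0.11 \times 1.284 \approx 0.14$, so the result agrees with the quoted $\mathcal{N} = 5.58 \pm 0.14$ to within the rounding of the inputs.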



Figure 3 shows the corresponding trial factor, compared to the bound calculated from eq. (3) and the asymptotic approximation of eq. (12).




Fig. 3. The trial factor estimated from toy Monte Carlo simulations (solid line), with the upper bound of eq. (3) (dotted black line) and the asymptotic approximation of eq. (12) (dotted red line). The yellow band represents the statistical uncertainty due to the limited sample size.



We consider in addition a case where the number of degrees of freedom is more than one. For this purpose, we assume several independent channels, each identical to the one described above, and where the signal normalizations $(\mu_1, \ldots, \mu_s)$ are free parameters. (This could represent, for example, a case where one is searching for a resonance in several decay channels, with unknown branching ratios.) The reference level is chosen to be $c_0 = s - 1$ as discussed in the previous section. The resulting distributions and trial factors for $s = 2, 3$ are shown in Figs. 4 and 5. As before, the bound (3) agrees with the observed p-value, within statistical variation. The rate at which the asymptotic approximation (11) converges to the bound becomes slower when the number of degrees of freedom increases, making it less accurate; however, the trend of linear growth is evident.



Fig. 4. (top) Distribution of $q(\hat{m})$ for $s = 2, 3$. (bottom) Tail probability of $q(\hat{m})$. The solid lines show the result of the Monte Carlo simulation, the dotted red lines are the predicted bound (eq. 3) with the estimated $\langle N(c_0) \rangle$ (see text).

Fig. 5. The trial factors estimated from toy Monte Carlo simulations (solid line), with the upper bound of eq. (3) (dotted black line) and the asymptotic approximation of eq. (11) (dotted red line). The yellow band represents the statistical uncertainty due to the limited sample size.



4 Conclusions

The look-elsewhere effect presents a case when the standard regularity conditions of Wilks' theorem do not apply, and so specialized methods are required for estimating tail probabilities in the large sample limit. Such methods, however, exist and, as we have demonstrated, can provide accurate results under fairly general conditions. The procedure described in this paper consists of estimating the expected number of upcrossings of the likelihood ratio at a low reference level using a small set of Monte Carlo simulations. This can then be related to the expected number of upcrossings at a higher level using Davies' result (2), providing a bound on the probability of the likelihood ratio exceeding this level, given by (3). The method is easy to implement in practical situations, and the bound converges to the actual tail probability when this probability becomes small. It has further been shown that the trial factor is asymptotically proportional to the effective number of independent search regions and to the fixed-mass significance, allowing for a simple interpretation of the effect as being the result of two factors: the first one is the mere fact that there are more available distinct regions wherein fluctuations can occur, represented by the effective number of independent regions; the second effect is that within each region we further maximize the likelihood by fitting the mass in the neighborhood of the fluctuation, which can be described by adding a degree of freedom to the fit.